Entity Typing Using Distributional Semantics and DBpedia
Authors
Abstract
Recognising entities in a text and linking them to an external resource is a vital step in creating a structured resource (e.g. a knowledge base) from text. This allows semantic querying over a dataset, for example selecting all politicians or football players. However, traditional named entity recognition systems distinguish only a limited number of entity types (such as Person, Organisation and Location), and entity linking often cannot link every entity found in a text to a knowledge base. This creates a gap in coverage between what is in the text and what can be annotated with fine-grained types. This paper presents an approach to detect entity types using DBpedia type information and distributional semantics. The distributional semantics paradigm assumes that similar words occur in similar contexts. We exploit this by comparing entities of unknown type to entities whose type is known, and assigning to the unknown entity the type of the most similar set of entities. We demonstrate our approach on seven different named entity linking datasets. To the best of our knowledge, our approach is the first to combine word embeddings with external type information for this task. Our results show that this task is challenging but not impossible, and that performance improves when the search space is narrowed by adding more context to the entities in the form of topic information.
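The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similarity to a "set of entities" is scored, so the mean cosine similarity used here is an assumption, and the vectors, the type labels and the function names `cosine` and `assign_type` are hypothetical toy values chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_type(unknown_vec, typed_vectors):
    """Return (type, score) for the type whose known entities are most similar.

    typed_vectors maps a type label to the embedding vectors of entities
    already known to have that type. Scoring a type by the MEAN cosine
    similarity is an assumption made for this sketch.
    """
    best_type, best_score = None, float("-inf")
    for type_label, vectors in typed_vectors.items():
        if not vectors:
            continue  # a type with no known entities cannot be scored
        score = sum(cosine(unknown_vec, v) for v in vectors) / len(vectors)
        if score > best_score:
            best_type, best_score = type_label, score
    return best_type, best_score


# Hypothetical 3-dimensional "embeddings"; real word embeddings have
# hundreds of dimensions and are learned from a corpus.
TOY_TYPED_VECTORS = {
    "Politician": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "SoccerPlayer": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}

if __name__ == "__main__":
    label, score = assign_type([0.85, 0.15, 0.05], TOY_TYPED_VECTORS)
    print(label, round(score, 3))
```

Restricting `typed_vectors` to the types that are plausible for a document's topic would be one way to narrow the search space, in the spirit of the topic information mentioned in the abstract.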
Similar resources
Building a Fine-Grained Entity Typing System Overnight for a New X (X = Language, Domain, Genre)
Recent research has shown great progress on fine-grained entity typing. Most existing methods require pre-defining a set of types and training a multi-class classifier from a large labeled data set based on multi-level linguistic features. They are thus limited to certain domains, genres and languages. In this paper, we propose a novel unsupervised entity typing framework by combining symbolic ...
Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems.
The ability of automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework by combining symbolic and distributional semantics...
Automatic Typing of DBpedia Entities
We present Tìpalo, an algorithm and tool for automatically typing DBpedia entities. Tìpalo identifies the most appropriate types for an entity by interpreting its natural language definition, which is extracted from its corresponding Wikipedia page abstract. Types are identified by means of a set of heuristics based on graph patterns, disambiguated to WordNet, and aligned to two top-level ontol...
ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
Machine learning about language can be improved by supplying it with specific knowledge and sources of external information. We present here a new version of the linked open data resource ConceptNet that is particularly well suited to be used with modern NLP techniques such as word embeddings. ConceptNet is a knowledge graph that connects words and phrases of natural language with labeled edges...
Uncovering the Semantics of Wikipedia Pagelinks
Wikipedia pagelinks, i.e. links between Wikipages, carry an intended semantics: they indicate the existence of a factual relation between the DBpedia entity referenced by the source Wikipage, and the DBpedia entity referenced by the target Wikipage of the link. These relations are represented in DBpedia as triple occurrences of a generic "wikiPageWikilinks" property. We designed and imple...